Nixiesearch: running Lucene over S3, and why we are building our own serverless search engine

Roman Grebennikov • Location: TUECHTIG • Haystack EU 2024

Is your search cluster stuck in ‘status: red’ due to over-complicated maintenance? Are modern vector databases still falling short in solving your real-world search problems? You’re not alone. Companies like Uber, DoorDash, Amazon, and Yelp have turned to running their search backends on Lucene over S3 for its simplicity and reliability.

We’re introducing Nixiesearch, an open-source, Lucene-based search engine designed for operational simplicity: nodes are stateless and all index data is stored on S3. Nixiesearch offers the full range of familiar Lucene features (such as filters, facets and suggestions), and also comes with “AI batteries included”, such as RAG and text+image embeddings handled by the engine itself.

In this talk, we’ll dive into the design trade-offs we made between simplicity and complexity, and explore why Nixiesearch might (or might not) be a good fit for your search needs.

Roman Grebennikov

A principal ML engineer and ex-startup CTO working on modern search and recommendation problems. A pragmatic fan of open-source software, functional programming, LLMs and performance engineering.